A Novel Indexing Technique for Web Documents using Hierarchical Clustering

Authors

  • Deepti Gupta
  • Komal Kumar Bhatia
  • A. K. Sharma
Abstract

The information on the WWW is growing at an exponential rate, so search engines must index downloaded Web documents more efficiently. Web mining techniques such as clustering can be used for this purpose. This paper proposes a novel indexing technique that uses hierarchical clustering, driven by a similarity measure and fuzzy string matching, to organize the indexed information. The technique keeps related documents in the same cluster, so that searching for documents becomes more efficient in terms of time complexity.
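
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words cosine similarity as the similarity measure, single-link agglomerative merging as the hierarchical clustering step, and Python's difflib for fuzzy string matching. All function names, the sample documents, and the thresholds are hypothetical.

```python
import math
from collections import Counter
from difflib import get_close_matches


def vectorize(text):
    """Bag-of-words term counts (assumed document representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Single-link agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops when the
    best remaining pair falls below `threshold` (an assumed cutoff).
    """
    vecs = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1, c2):
        return max(cosine(vecs[i], vecs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] += clusters[y]
        del clusters[y]
    return clusters


def build_index(docs, clusters):
    """Map each term to the ids of the clusters that contain it."""
    index = {}
    for cid, members in enumerate(clusters):
        for i in members:
            for term in vectorize(docs[i]):
                index.setdefault(term, set()).add(cid)
    return index


def search(term, index, clusters, cutoff=0.8):
    """Fuzzy-match `term` to an indexed term, then return every document
    in the matching clusters, so related documents are found together."""
    match = get_close_matches(term.lower(), list(index), n=1, cutoff=cutoff)
    if not match:
        return []
    return sorted(i for cid in index[match[0]] for i in clusters[cid])


# Hypothetical sample data for illustration only.
docs = [
    "web search engine index",
    "search engine index documents",
    "fuzzy string matching",
    "fuzzy string similarity",
]
clusters = cluster(docs)
index = build_index(docs, clusters)
print(clusters)
print(search("serch", index, clusters))  # misspelled query term
```

In this sketch a query only has to locate one matching cluster instead of scanning every document, which is the kind of saving in search time the abstract refers to. The pairwise merging loop is deliberately naive and would need a more efficient linkage computation for a real document repository.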

Similar resources

Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining

Nowadays, the World Wide Web is a collection of a large amount of information that is increasing day by day. This growing amount of information calls for an efficient and effective indexing structure. Indexing has become the major issue in improving the performance of Web search engines, so that the most relevant web documents are retrieved in minimum possib...

Fuzzy clustering for indexing in the GAMBAL information retrieval system

Gambal is an information retrieval system for indexing and accessing web pages that includes graphical interfaces to ease web page search and access. In particular, the interfaces provide the user with tools for navigating through hierarchies of documents and for visualizing selected documents and similar ones. Here, similarity is based on either Wordnet 1.7 or Latent Semantics Analysis. Graphical...

Retrieval of Web Documents Using a Fuzzy Hierarchical Clustering

The World Wide Web holds a huge amount of information that is retrieved using information retrieval tools such as search engines. The page repository of a search engine contains the web documents downloaded by the crawler. This repository contains a variety of web documents from different domains. In this paper, a technique called “Retrieval of Web documents using a fuzzy hierarchical clustering” is being propo...

Web Document Clustering Using Fuzzy Equivalence Relations

Conventional clustering means classifying the given data objects into exclusive subsets (clusters). That is, we can discriminate clearly whether an object belongs to a cluster or not. However, such a partition is insufficient to represent many real situations. Therefore, a fuzzy clustering method is offered to construct clusters with uncertain boundaries and allows that one object belongs to overl...

Hierarchical Summarizing and Evaluating for Web Pages

In this investigation we propose a novel summarization method for Web pages using hierarchical expression. We discuss the close relationship between summarization and hierarchical clustering to obtain the results, and we examine how to evaluate hierarchical summarization based on both correlation and structural aspects. We describe some experimental results using NTCIR Web documents to examine our m...

Publication date: 2009